Information Systems Education Journal (ISEDJ) 
ISSN: 1545-679X 


13(1) 
January 2015 


A Proposed Concentration Curriculum Design for 
Big Data Analytics for Information Systems Students 


John C. Molluzzo 
jmolluzzo@pace.edu 

James P. Lawler 
jlawler@pace.edu 

Pace University 
One Pace Plaza 
New York, New York 10038 

Abstract 

Big Data is becoming a critical component of the Information Systems curriculum. Educators are 
gradually enhancing the concentration curriculum for Big Data in schools of computer science and 
information systems. This paper proposes a creative curriculum design for Big Data Analytics for a 
program at a major metropolitan university. The design emphasizes expanded learning of business, 
mathematical and statistical, and presentation skills in team projects, in addition to skills in 
technology. This paper will be beneficial to educators considering improvement of the curriculum for 
Big Data Analytics and to students desiring a more contemporary program. 

1. BACKGROUND AND DEFINITION 

"Data is the New Oil" (Smolan and Erwitt, 2012). 

Big Data is defined as "bigger and bigger and 
bigger" (Aiden, & Michel, 2013) aggregates of 
data that challenge business firms in analyzing 
business benefit with common software. Big 
Data dimensions are defined by variety, in 
structured data, such as traditional 
transaction data, and non-structured data, such 
as mobile sensor and social media networking 
data; by velocity, in sensitivity to the real-time 
timeliness of the data; by veracity, in the 
purity of data at sizable volume; and by sheer 
streaming volume (Ohlhorst, 2013). Big Data is 
essentially a data management paradigm shift 
(Borkovich, & Noah, 2013). Big Data is 
estimated to be in dozens of terabytes to 
multiples of petabytes (IBM, 2014), growing 
50% each year, or more than doubling every 2 
years, in business firms (Lohr, 2012); and is estimated to 
be further impacted by increased information 
from the "internet of things" (Morozov, 2014), 
such as consumer wearables (Minsker, 2014a). 
Firms in the retail industry, such as Walmart, 
store 2.5 petabytes of data, where 1 petabyte is 1 quadrillion bytes 
(McAfee, & Brynjolfsson, 2012). The storing of 
data however is less important than the business 
benefit to be acquired from the analysis of the 
data, in cost control, decision improvement and 
design improvements in processes, products and 
services (Davenport, 2014), especially from the 
cross-fertilization of customer social networking 
and transaction data streaming into firms 
(Brustein, 2014). The accessibility of such data 
is apparently a "big deal" (eWeek, 2013), as 
firms exploit the potential of data analytics in 
this perceived revolution of technology 
(Freeland, 2012). 
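
The growth and storage figures cited above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes compound annual growth and decimal units (1 petabyte = 10^15 bytes, that is, 1 quadrillion bytes).

```python
# Illustrative arithmetic only; the function names are hypothetical.

BYTES_PER_PETABYTE = 10**15  # decimal petabyte: 1 quadrillion bytes


def growth_factor(annual_rate: float, years: int) -> float:
    """Return the compound growth multiple after `years` at `annual_rate`."""
    return (1 + annual_rate) ** years


def petabytes_to_bytes(petabytes: float) -> float:
    """Convert decimal petabytes to bytes."""
    return petabytes * BYTES_PER_PETABYTE


if __name__ == "__main__":
    # 50% annual growth compounds to a 2.25x multiple over two years,
    # which is more than a doubling.
    print(growth_factor(0.50, 2))
    # 2.5 petabytes expressed in bytes.
    print(petabytes_to_bytes(2.5))
```

Under these assumptions, growth of 50% each year compounds to 125% growth over two years, so data volume more than doubles in that period.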

The benefits of Big Data Analytics (henceforth 
referred to as BDA) are cited frequently in firms 
(IBM, 2014). Amazon is analyzing data for 
competitive customer micro-segmentation of 
products to customize its products and services; 
Google is analyzing messaging to improve its 
services (Rosenblatt, 2014); and Tesco and 
Walmart are analyzing demographics to lower 
inventory pricing of products and services at 
their stores. Twitter is analyzing hashtags for 
more patterns of potential sales from subject 
trends. Firms are clearly interested in BDA to 
optimize the outcomes of processes, products 
and services. Even the government and the 
health industry (Kim, Trimi, & Chung, 2014) are 
commencing initiatives in efficiency, decision 
improvement and cost control from BDA to 
optimize the outcomes of processes and services 
(Liyakasa, 2013). Though firms may be storing 
increased data without increased insight 
(Minsker, 2014b), the potential of BDA as a 
profitable attribute, beyond the benefits of 
Business Intelligence and Operations Research, 
is evident in the literature (King, 2014). This 
potential invites consideration of BDA as a 
differential feature of learning in schools of 
computer science and information systems. 

Graduates from schools of computer science and 
information systems can contribute to the field 
of data analytics if the curricula of the schools 
include BDA. Though firms are investing in BDA, 
they do not have enough data scientists or 
specialists (May, 2013) for extracting the 
potential of their data (McCafferty, 2013). 
Graduates can contribute to the field if they 
have analytical business skills (Janicki, 
Cummings, & Kline, 2013) and content domain 
expertise skills (Poremba, 2013) to critically 
evaluate the business (Pratt, 2013) of Big Data. 
They can contribute to this evaluation if they 
have computational mathematical and statistical 
skills (Hulme, 2013), can interpret in a high 
performance environment the complex event 
significances of structured and non-structured 
data, and evaluate potential problem solutions 
or proposed strategies (Pratt, 2013). They can 
contribute further if they have privacy and 
security sensitivity skills in standards for Big 
Data housed by organizations, especially given 
intrusion issues as discussed in the literature 
(Lohr, 2013; Sengupta, 2013; Angwin, 2014). 
Moreover, they can contribute notably to the 
field if they have persuasive presentation skills 
(Miller, 2013), as individual contributors or 
members of teams, in proposing solutions and 
strategies; and they can contribute powerfully if 
they have skills in visualization (Rao, & Halter, 
2013). These skills are beyond the data base 
analysis, design and development skills in 
technology (King, 2013). Though few 
graduates, or even practitioners in industry 
(MacSweeney, 2013a), have all of these skills to 
be data scientists, a creative curriculum design 
in Data Science may improve the breadth of 
ensemble learning for students currently 
enrolled in schools of computer science and 
information systems. 

2. FOCUS OF PAPER 

"The Big Data Revolution holds the promise of 
empowering all of us with knowledge" (Smolan 
and Erwitt, 2012). 

The proposed concentration of Data Science at 
Pace University is the focus of this paper. The 
focus is apt, as firms desire BDA personnel but 
do not have enough expertise for initiatives on 
projects (Davenport, 2014). Many firms have a 
BDA expertise gap (Olavsrud, 2014), despite the 
hype of pundits. The literature indicates that 
industry will need 140,000 to 190,000 BDA 
professionals, if not data scientists, by 2018 
(Manyika, Chui, Brown, Bughin, Dobbs, 
Roxburgh, & Byers, 2011) and even a high of 
4.4 million scientists in 2015 (IBM, 2014). Even 
though a concentration of Data Science is not 
enough for an immediate solution, the 
concentration is current with the expectations of 
industry and organizations, as they initiate 
investment in Big Data strategies (Messmer, 
2014). The focus of this paper on the Data 
Science curriculum will benefit educators and 
students desiring a foundation for an 
immediately marketable program. 

3. CONCENTRATION METHODOLOGY - DATA SCIENCE 

"Having the data is only the beginning" (Smolan 
and Erwitt, 2012). 

Pace University is anticipating beginning a 
concentration in Data Science in 2015 with the 
offering of the Concepts of Big Data Analytics 
course. The concentration covers descriptive, 
predictive and prescriptive analytics (Camm, 
Cochran, Fry, Ohlmann, Anderson, Sweeney, & 
Williams, 2015) for data-driven decision making. 
The concentration is designed for undergraduate 
students to learn business, mathematical and 
statistical, presentation, team-playing and high-level 
technology skills. In the concentration, 
projects are assigned to incubating pseudo data 
scientist teams (O'Neil, & Schutt, 2014) 
of 3-5 students with different skills 
(Schmerken, 2013). The 
projects are focused on the design of processes, 
products or services in a discrete industry, such 
as energy, entertainment, finance, health and 
life sciences, or retail. The projects are to be 
focused on BDA problems in the industries and 
are to be furnished with data sets of a massive 
scale from non-proprietary Web sites and 
systems, such as www.data.gov, www.enigma.io 
(Singer, 2014) and www.openwebanalytics.com. 
The projects are to include internship and 
mentoring of the student teams from a few firms 
in the industry that are partnered with the 
school and even have data analytics 
employment positions (ITBusinessEdge, 2014). 
In 2016-2017 a few boutique data scientist 
firms may be partnered with the school. 
Technologies in the concentration include, but 
are not limited to, Apache, Hadoop, MapReduce, 
NLP for text, NoSQL, Python tools (Knorr, 2013) 
and SAS tools. The concentration in Data 
Science is planned to begin in 2016-2017 after 
the Concepts of Big Data Analytics course, by 
expanded learning of mathematical, statistical 
and technology skills that will involve other 
faculty (King, 2013) in the school. Few schools 
of computer science and information systems 
have curriculum design initiatives in Data 
Science (MacSweeney, 2013b) as envisioned in 
this paper. 

The generic learning objectives of the Data 
Science concentration are defined below: 

• Analyze a business process, product or service 
for experimental improvement in an 
organization that can benefit by BDA; 

• Collaboratively design a discovery and 
exploratory method for interpreting the 
customer data domain dynamics of the 
process, product or service that include 
structured and unstructured data sources; 

• Collaboratively develop a conceptual data 
business model for the process, product or 
service problem and for the solution, infused 
by intelligence learned in the discovery and 
exploratory process and by leveraging of a 
BDA tool(s), integrating a data service process 
prototype scenario(s) - what can the firm do 
better now with the information that it could 
not do before it had it? (Provost, & Fawcett, 
2013); 

• Collaboratively develop a governance plan for 
the new product or service or a process 
solution for the firm, informed by customer 
data privacy rights and security sensitivity 
standards; and 

• Formulate an organizational production plan 
for integrating the data sources, systems and 
technologies for the proposed process, product 
or service solution and for integrating BDA as 
an overall business strategy. 

The proposed courses in the concentration are 3 
credits each. The outcomes of the concentration are 
analytical business skills, creative problem-solving 
skills, Big Data modeling skills, 
fundamental mathematical and statistical skills, 
presentation and team-playing skills, and 
privacy and security sensitivity skills. The 
goal of the concentration is for its graduates to 
be business data scientists, not mere scientist 
technologists. The curriculum is a foundation 
from which there may be employment postings 
of BDA specialties for the students upon 
graduation from the university. 

Pending approval by an internal curriculum 
committee of the school, the concentration of 
Data Science will begin in fall 2015 with the 
offering of Concepts of Big Data Analytics, as 
discussed in this paper. The plan is to roll out the 
full concentration during 2015-2017 with the 
following courses. 

Concept Courses 

• Concepts of Big Data Analytics, a course on 
critical Big Data modeling of a process, 
product or service in industry; 

• Big Data Maturity Model, a conceptual course 
on benchmarking of best Big Data 
organizational practices in industry; and 

• Customer Relationship Management (CRM) 
and Big Data, a conceptual course integrating 
BDA and household priority relationship 
strategy in industry. 

Domain Courses 

• Big Data Analytics in Energy, a domain course 
integrating BDA projects for decision-making 
in the energy industry; 

• Big Data Analytics in Entertainment, a domain 
course integrating BDA projects for decision-making 
in the entertainment and sports 
industries; 

• Big Data Analytics in the Financial Industry, a 
domain course integrating BDA projects for 
decision-making in the international financial 
services industry; 

• Big Data Analytics in Health and Life Sciences, 
a domain course integrating BDA projects for 
decision-making in the health and life sciences 
industry, including ObamaCare initiatives; and 

• Big Data Analytics in Retail Industries, a 
domain course integrating BDA projects for 
decision-making in the retail industries. 

Enabling Courses 

• Big Data Ethical Framework, an integrative 
course on BDA privacy, regulatory and security 
standards governing analytics professionals in 
industries and organizations; and 

• Big Data Foundation Technology, an 
integrative course on required BDA high 
performance infrastructure platform and 
storage technologies and tools. 

The concentration of Data Science is depicted in 
Figure 1. The concentration is fulfilled in the 
three conceptual courses, three of the five 
domain courses, and the two enabling courses of 
the plan. The concentration is currently designed 
for the undergraduate students of the school but 
may be expanded in 2017 for graduate students 
of the School and of the School of Business of 
the university. 
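
The fulfillment rule described above can be sketched as a small check. This is a hypothetical illustration, not part of the proposed program: the course titles follow this paper, but the function and constant names are assumptions, "three of the five domain courses" is read as "at least three", and the 24-credit total is simple arithmetic derived from 8 courses at 3 credits each.

```python
# Hypothetical sketch of the fulfillment rule; course titles follow the paper,
# but the function and constant names are assumptions for illustration.
from math import comb
from typing import Set

CONCEPT = {
    "Concepts of Big Data Analytics",
    "Big Data Maturity Model",
    "Customer Relationship Management (CRM) and Big Data",
}
DOMAIN = {
    "Big Data Analytics in Energy",
    "Big Data Analytics in Entertainment",
    "Big Data Analytics in the Financial Industry",
    "Big Data Analytics in Health and Life Sciences",
    "Big Data Analytics in Retail Industries",
}
ENABLING = {
    "Big Data Ethical Framework",
    "Big Data Foundation Technology",
}
REQUIRED_DOMAIN_COUNT = 3  # assumed to mean "at least three"
CREDITS_PER_COURSE = 3


def fulfills_concentration(completed: Set[str]) -> bool:
    """True if all concept and enabling courses and enough domain courses are done."""
    return (
        CONCEPT <= completed
        and ENABLING <= completed
        and len(DOMAIN & completed) >= REQUIRED_DOMAIN_COUNT
    )


def minimum_credits() -> int:
    """Credits for the smallest set of courses that fulfills the concentration."""
    courses = len(CONCEPT) + REQUIRED_DOMAIN_COUNT + len(ENABLING)
    return courses * CREDITS_PER_COURSE


if __name__ == "__main__":
    print(minimum_credits())                       # 8 courses x 3 credits
    print(comb(len(DOMAIN), REQUIRED_DOMAIN_COUNT))  # distinct domain selections
```

Under these assumptions, a student completes at least 8 courses (24 credits) and can choose the three domain courses in 10 different ways.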

Table 1 lists courses that can give the student 
requisite skills in business, mathematics, 
statistics, and presentation, team-playing and 
high-level technology. The descriptions are fairly 
generic and reflect existing courses in most 
institutions that have a business major. 

4. COURSE MODEL - BIG DATA ANALYTICS 

" ... Big Data is much more than big data" 
(Smolan and Erwitt, 2012). 

The field of data science or data analytics is 
relatively new, with few consistencies in the 
content or names of introductory courses. 

During January-February, 2014, a scan of the 
Internet disclosed the following names for 
introductory courses: 

• Advanced Big Data Analytics 

• Analytics and Decision Analysis 

• Applied Data Science 

• Big Data Analytics 

• Business Analytics 

• Business Intelligence and Analytics 

• Data Analytics 

• Data Analytics for Information Systems 

• Data and Decision Analytics 

• Data Warehousing and Analytics 

• Elements of Data Analysis 

• Introduction to Business Analytics 

• Introduction to Data Analytics 

• Introduction to Data Science 

• Large-scale Data Analysis 

For this paper, the Concepts of Big Data 
Analytics course is outlined in Table 2 of the 
Appendix. 

Table 2 contains five columns corresponding to 
the online syllabi of five university introductory 
Data Science or Analytics courses. Over the 
period of February - March 2014, the authors 
reviewed the online syllabi of 21 introductory 
courses that contained Data Analytics or Data 
Science in their titles, all from Tier I and Tier II 
universities. The courses represented in Table 2 
are a representative sample of the 21 courses. 
The five columns can be used to compare the 
Concepts of Big Data Analytics course of this 
paper to those at these other universities. Note 
that the omission of a checkmark does not mean 
the topic is not covered in that course. The 
checkmarks indicate what was available on the 
Web sites of the universities. Table 2 does not 
name the universities corresponding to the five 
columns, in order to avoid any criticism of the 
universities; the table is for comparison only. 

The Concepts of Big Data Analytics course 
emphasizes the concepts behind modern Data 
Science. The course is conceptual in the sense 
that the principles behind Data Science are 
emphasized rather than the tools with which to 
implement them. Therefore, topics such as R 
and Python programming, Hadoop, MapReduce, 
and so on are not covered to any extent in this 
course. Instead, they are covered as 
needed in the domain courses to solve industry-specific 
problems. Some topics in the course 
require knowledge of probability and statistics. 
Therefore, the basic statistics course required of 
all computing majors is a prerequisite for this 
course. Because the course has no programming 
requirements, it is accessible to any student in 
the university who has the statistics 
prerequisite, which thus includes all business 
majors in the university. 

The course emphasizes the strategic value of 
data and the use of Data Science teams in an 
organization, and the use of data-driven 
decision-making. It introduces students to data- 
analytic thinking and Data Science principles to 
facilitate communication between business 
stakeholders and the Data Science teams. The 
course also discusses the limitations and pitfalls 
(e.g., overfitting) of Data Science and the 
necessity of human involvement in choosing the 
right data and evaluating the processes and 
results of the Data Science projects. The 
ultimate goal of this course is to enable students 
to participate in the development and proper 
evaluation of a Data Analytics solution to a 
business problem. 
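
The overfitting pitfall mentioned above can be illustrated with a toy example. The sketch below is not course material: the data are hand-made for illustration, and the two models (a nearest-neighbor rule that memorizes the training data, and a single cut-off threshold) are assumptions chosen to keep the example short.

```python
# Toy illustration of overfitting with hand-made data (an assumption for
# illustration; not data or code from the course).

# Training data: the underlying rule is label = 1 when x >= 5,
# but two labels (x = 2 and x = 7) are deliberately flipped as noise.
TRAIN = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]

# Held-out data: new points labeled by the underlying rule, with no noise.
TEST = [(x + 0.4, 1 if x >= 5 else 0) for x in range(10)]


def nearest_neighbor_predict(x):
    """Complex model: copy the label of the closest training point."""
    return min(TRAIN, key=lambda point: abs(point[0] - x))[1]


def fit_threshold():
    """Simple model: pick the cut-off t (predict 1 when x >= t)
    that makes the fewest errors on the training data."""
    def errors(t):
        return sum((1 if x >= t else 0) != label for x, label in TRAIN)
    return min(range(11), key=errors)


THRESHOLD = fit_threshold()


def threshold_predict(x):
    """Predict 1 when x is at or above the fitted cut-off."""
    return 1 if x >= THRESHOLD else 0


def accuracy(predict, data):
    """Fraction of points in `data` that `predict` labels correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)


if __name__ == "__main__":
    for name, model in [("nearest neighbor", nearest_neighbor_predict),
                        ("threshold", threshold_predict)]:
        print(name, accuracy(model, TRAIN), accuracy(model, TEST))
```

On this toy data, the memorizing model is perfect on the training set but reproduces the noisy labels on the held-out points, while the simpler threshold model misses the two noisy training labels and generalizes better. This is the kind of trade-off that a human evaluator of a Data Science project needs to recognize.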

The text will be Provost, F., & Fawcett, T. 
(2013), Data Science for Business: What You 
Need to Know about Data Mining and Data-Analytic 
Thinking. The supporting text will be 
Davenport, T.H. (2014), Big Data at Work: 
Dispelling the Myths, Uncovering the 
Opportunities. Alternatively, the following can be 
used as the text: Davenport, T.H., & Harris, J.G. 
(2007), Competing on Analytics: The New 
Science of Winning. The text will be supplemented 
by Analysis INFORMS Magazine. 

The course will also discuss in detail several 
recent case studies of the application of BDA to 
real business situations. There are many online 
resources from which to obtain such cases (e.g., the BDA sites 
of IBM [2014] and HP Vertica [2014]). 

5. IMPLICATIONS 

"Big Data started as a series of small waves but 
is morphing into the greatest tsunami of 
information that humans have ever seen" 
(Smolan and Erwitt, 2012). 

The terms "Big Data", "Data Science", and "Data 
Analytics" are so ubiquitous in the practitioner 
press that they may seem to be just the next 
hyped fad that will fade into oblivion in a few 
years. However, with the ability to process and 
store the many kinds of data collected by firms 
and organizations, and the need to use these 
data to strategic advantage, the field of Data 
Science will not disappear soon. Several studies, 
such as Brynjolfsson, Hitt, and Kim (2011), and 
Tambe (2014), have shown that the more data-driven 
a firm is, the more successful it is. 
Therefore, more and more organizations will be 
hiring data scientists to take advantage of their 
ever-growing store of data. There is, however, a 
problem - our universities are not keeping up 
with the demand. 

There will be a shortage of talent 
necessary for organizations to take 
advantage of big data. By 2018, the 
United States alone could face a 
shortage of 140,000 to 190,000 people 
with deep analytical skills as well as 
1.5 million managers and analysts with 
the know-how to use the analysis of 
big data to make effective decisions. 
(Manyika, et al., 2011). 

Therefore, it is important for universities to 
begin developing data scientists who have the 
requisite technical skills, and the business and 
content domain knowledge to leverage the data 
that organizations are now accumulating for 
advantage (Gillespie, 2014). The program in 
Data Science proposed in this paper helps to fill 
this need. 

6. LIMITATIONS AND OPPORTUNITIES 

The concentration of Data Science and the 
course on Big Data Analytics are beginning as a 
program in fall 2015, but evaluation of the 
impact of the initial program on the students 
may not be finished until fall 2016. 

The curriculum for Data Science with Big Data 
Analytics is not clearly defined in the literature 
(Dietrich, Newton, & Corley, 2013), and the field 
is immature in instruction. Within the next year, 
the authors plan to survey instructors of 
introductory Data Analytics and Data Science 
courses with a view towards determining which 
topics are essential for such courses, and which 
topics are less so. The authors hope that this 
future research will help create a common core 
of topics for an introductory course in Data 
Analytics/Data Science. 

The curriculum design in this paper furnishes 
important input to instructors in schools of 
computer science and information systems who 
want to have an initial program in tandem with 
trends. The literature indicates BDA as a 
desirable norm in organizations (Ohlhorst, 
2013), an opportunity for the response of 
schools of computer science and information 
systems. The model of this paper provides a 
first step. 

7. CONCLUSION 

This paper posits a curriculum design of Big Data 
Analytics in the proposed concentration of Data 
Science at Pace University. The design includes 
a discovery and exploratory method of critical 
Big Data modeling and the improvement of a 
process, product or service in industry. The 
design provides for inclusion of an organizational 
plan for process, product or service solutions, 
and a production strategy integrating non- 
traditional and traditional technologies and BDA 
tools. The design further provides privacy rights 
and security sensitivity standards. The design is 
ideal as firms and organizations pursue BDA 
projects. Few organizations have the full 
prerequisite skills for BDA projects and 
strategies. Throughout this paper, the design of 
BDA proposes the relevance of business, 
mathematical and statistical, and presentation 
and team-playing skills, augmenting prerequisite 
skills in the traditional data base management 
technologies and in the new non-traditional BDA 
tools. Overall, this paper provides a beneficial 
proposal to instructors desiring to initiate BDA 
and Data Science programs to be in tandem with 
industrial and organizational trends, and to 
undergraduate students intending to be in 
tandem with technological trends. 

APPENDIX 

Figure 1: Data Science Concentration 

CONCEPT COURSES 

• Concepts of Big Data Analytics (Spring 2015) 
• Big Data Maturity Model (Summer 2015) 
• Customer Relationship Management (CRM) and Big Data (Fall 2015) 

DOMAIN COURSES 

• Big Data Analytics in Energy (Spring 2017) 
• Big Data Analytics in Entertainment (Summer 2017) 
• Big Data Analytics in Financial Industry (Spring 2016) 
• Big Data Analytics in Health and Sciences (Summer 2016) 
• Big Data Analytics in Retail Industries (Fall 2017) 

ENABLING COURSES 

• Big Data Ethical Framework (Fall 2016) 
• Big Data Foundation Technology (Fall 2016) 

Table 1 - Possible Support Courses 

| Course | Description |
| Contemporary Business Practice | The functions of business and their interrelationships. Students work in teams to run simulated companies. Development of business writing and speaking, presentation, and data analysis skills are emphasized. |
| Calculus I | Limits, continuity, derivatives of algebraic, exponential and logarithmic functions, optimization problems, introduction to integral calculus, the fundamental theorem of integral calculus. Business and economic applications are stressed throughout. |
| Probability and Statistics | Random processes; finite sample spaces, probability models, independent events, and conditional probability. Bayes' theorem, random variables, mathematical expectation; statistical applications of probability, introduction to sampling theory, confidence intervals and hypothesis testing. |
| Public Speaking | The mechanics of writing and presenting one's own material. This includes outlining, addressing varied audiences, styles, and appropriate techniques of delivery, as well as the use of technology to enhance one's presentation. |
| Introduction to Computer Systems | The basic components of a computer, how they are organized, and how they work together under the control of an operating system. Students examine theoretical concepts underlying hardware functions, troubleshooting and preventative maintenance techniques, safety precautions, system procurement, and upgrades, and discuss networking and software as it pertains to hardware functionality. |
| Financial Accounting | Accounting's role in satisfying society's needs for information and its function in business, government, and the non-profit sector. Students learn from a user-oriented perspective about the accounting cycle, the nature of financial statements and the process for preparing them, and the use of accounting information as a basis for decision making. |
| Managerial Accounting | A study of the fundamental managerial accounting concepts and techniques that aid in management decision-making, performance evaluation, planning and controlling operations. The emphasis is on the use of accounting data as a management tool rather than on the techniques of data accumulation. The course includes such topics as cost behavior patterns, budgeting and cost-volume-profit relationships. Quantitative methods applicable to managerial accounting are studied. |
| Managerial and Organizational Concepts | This course examines basic managerial functions of planning, organizing, motivating, leading, and controlling. Emphasis is also given to the behavior of individual and groups within organizations. |
| Principles of Marketing | This course examines marketing's place in the firm and in society. Considered and analyzed are marketing research and strategies for product development, pricing, physical distribution and promotion, including personal selling, advertising, sales promotion and public relations. |
| Microeconomics | Theory of demand, production and costs, allocation of resources, product and factor pricing, income distribution, market failure, international economics, and comparative economic systems. |
| Macroeconomics | National income determination, money and banking, business cycles and economic fluctuations, monetary and fiscal policy, economic growth, and current microeconomic issues. |

Table 2 - Syllabus for Concepts of Big Data Analytics 

| Week | Topic | School A | School B | School C | School D | School E |
| 1 | Data-Analytic Thinking |   |   |   |   |   |
|   | Data Science and Data-Driven Decision Making | X | X | X | X | X |
|   | Data as a Strategic Asset (Executive Firm Mentor Presentation) | X | X | X | X | X |
|   | Data-Analytic Thinking | X | X | X | X | X |
| 2 | Data Science Solutions to Business Problems |   |   |   |   |   |
|   | The Data Mining Process | X | X | X |   | X |
|   | Other Analytics Techniques | X | X | X |   |   |
| 3 | Predictive Modeling |   |   |   |   |   |
|   | Models |   | X | X | X | X |
|   | Supervised Segmentation |   | X | X | X | X |
|   | Visualizing Segmentations |   | X | X | X | X |
|   | Trees |   | X | X | X | X |
|   | Probability Estimation |   |   |   |   |   |
| 4 | Model Fitting |   |   |   |   |   |
|   | Classification Using Mathematical Functions |   | X |   | X | X |
|   | Linear Discriminant Function | X |   |   |   |   |
|   | Regression |   | X |   | X | X |
|   | Logistic Regression |   | X |   | X | X |
|   | Non-linear Functions, Neural Networks |   |   |   |   | X |
|   | Principal Component Analysis |   |   |   |   |   |
| 5 | Overfitting |   |   |   |   |   |
|   | Overfitting Examples |   |   |   |   |   |
|   | Overfitting Avoidance |   |   |   |   |   |
|   | Complexity Control |   |   |   |   |   |
| 6 | Similarity, Neighbors and Clusters |   |   |   |   |   |
|   | Similarity and Distance |   | X | X | X | X |
|   | Nearest Neighbor |   | X | X | X | X |
|   | Clustering |   | X | X | X | X |
| 7 | Decision-Analytic Thinking - Creating a Model |   |   |   |   |   |
|   | Evaluating Classifiers |   | X |   |   |   |
|   | Generalizing Beyond Classification |   |   |   |   |   |
|   | Expected Value |   |   |   |   |   |
|   | Outlier Detection |   |   |   |   |   |
| 8 | Visualizing Model Performance |   |   |   |   |   |
|   | Ranking |   | X | X |   |   |
|   | Profit Curves |   |   |   |   |   |
|   | ROC Graphs and Curves |   |   |   |   | X |
|   | Area Under the ROC Curve |   |   |   |   | X |
|   | Lift Curves |   |   |   |   |   |
| 9 | Evidence and Probabilities |   |   |   |   |   |
|   | Combining Evidence Probabilistically | X | X |   |   |   |
|   | Bayes Rule | X | X |   |   |   |
|   | Evidence Lift | X |   |   |   |   |
| 10 | Representing and Mining Text |   |   |   |   |   |
|   | The Importance of Text |   |   |   | X |   |
|   | Text Representation |   |   |   | X |   |
|   | N-gram Sequences |   |   |   | X |   |
|   | Named Entity Extraction |   |   |   | X |   |
|   | Topic Models |   |   |   | X |   |
| 11 | Decision-Analytic Thinking - Analytical Engineering |   |   |   |   |   |
|   | Selection Bias |   | X |   |   |   |
|   | Expected Value Decomposition |   | X |   |   |   |
| 12 | Other Data Science Techniques |   |   |   |   |   |
|   | Co-occurrence and Associations |   |   |   |   |   |
|   | Profiling (Optional Scientist Firm Mentor Presentation) |   |   |   |   |   |
|   | Link Prediction |   |   |   |   |   |
|   | Data Reduction |   |   |   |   |   |
|   | Bias, Variance, the Ensemble Method |   |   |   |   | X |
|   | Data-Driven Causal Explanation (Optional Scientist Firm Mentor Presentation) |   |   |   |   |   |
|   | Time Series |   | X |   |   |   |
| 13 | Data Science and Business Strategy |   |   |   |   |   |
|   | Achieving Competitive Advantage with Data Science |   |   |   | X |   |
|   | Sustaining Competitive Advantage with Data Science |   |   |   | X |   |
|   | Attracting Data Scientists and Teams (Optional Scientist Firm Mentor Presentation) |   |   |   | X |   |
|   | Evaluating Data Science Proposals |   |   |   | X |   |
|   | The Kaggle Model |   | X |   |   |   |
| 14 | Ethics and Data Science |   |   |   |   |   |
|   | Data Security | X |   |   |   |   |
|   | Privacy | X | X |   | X |   |
|   | ACM Code of Ethics |   |   |   |   |   |